{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Ex2 - Getting and Knowing your Data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This time we are going to pull data directly from the internet.\n",
    "Special thanks to: https://github.com/justmarkham for sharing the dataset and materials.\n",
    "\n",
    "### Step 1. Import the necessary libraries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2023-12-28T14:04:58.136780Z",
     "start_time": "2023-12-28T14:04:57.986723Z"
    }
   },
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 2. Import the dataset from this [address](https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv). "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 3. Assign it to a variable called chipo."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2023-12-28T14:04:59.537470Z",
     "start_time": "2023-12-28T14:04:58.005871Z"
    }
   },
   "outputs": [],
   "source": [
    "chipo = pd.read_csv(\"https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv\",sep=\"\\t\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 4. See the first 10 entries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "collapsed": false,
    "scrolled": false,
    "ExecuteTime": {
     "end_time": "2023-12-28T14:04:59.546280Z",
     "start_time": "2023-12-28T14:04:59.538546Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "   order_id  quantity                              item_name  \\\n",
      "0         1         1           Chips and Fresh Tomato Salsa   \n",
      "1         1         1                                   Izze   \n",
      "2         1         1                       Nantucket Nectar   \n",
      "3         1         1  Chips and Tomatillo-Green Chili Salsa   \n",
      "4         2         2                           Chicken Bowl   \n",
      "5         3         1                           Chicken Bowl   \n",
      "6         3         1                          Side of Chips   \n",
      "7         4         1                          Steak Burrito   \n",
      "8         4         1                       Steak Soft Tacos   \n",
      "9         5         1                          Steak Burrito   \n",
      "\n",
      "                                  choice_description item_price  \n",
      "0                                                NaN     $2.39   \n",
      "1                                       [Clementine]     $3.39   \n",
      "2                                            [Apple]     $3.39   \n",
      "3                                                NaN     $2.39   \n",
      "4  [Tomatillo-Red Chili Salsa (Hot), [Black Beans...    $16.98   \n",
      "5  [Fresh Tomato Salsa (Mild), [Rice, Cheese, Sou...    $10.98   \n",
      "6                                                NaN     $1.69   \n",
      "7  [Tomatillo Red Chili Salsa, [Fajita Vegetables...    $11.75   \n",
      "8  [Tomatillo Green Chili Salsa, [Pinto Beans, Ch...     $9.25   \n",
      "9  [Fresh Tomato Salsa, [Rice, Black Beans, Pinto...     $9.25   \n"
     ]
    }
   ],
   "source": [
    "print(chipo.head(10))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 5. What is the number of observations in the dataset?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2023-12-28T14:04:59.550428Z",
     "start_time": "2023-12-28T14:04:59.543486Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "4622\n",
      "4622\n"
     ]
    }
   ],
   "source": [
    "# Solution 1\n",
    "print(len(chipo))\n",
    "print(chipo.shape[0])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2023-12-28T14:04:59.560262Z",
     "start_time": "2023-12-28T14:04:59.552226Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "4622\n"
     ]
    }
   ],
   "source": [
    "# Solution 2\n",
    "print(chipo.shape[0])\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 6. What is the number of columns in the dataset?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2023-12-28T14:04:59.613Z",
     "start_time": "2023-12-28T14:04:59.556473Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "5\n"
     ]
    }
   ],
   "source": [
    "print(chipo.shape[1])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 7. Print the name of all the columns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2023-12-28T14:04:59.635841Z",
     "start_time": "2023-12-28T14:04:59.563539Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['order_id' 'quantity' 'item_name' 'choice_description' 'item_price']\n"
     ]
    }
   ],
   "source": [
    "print(chipo.columns.values)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 8. How is the dataset indexed?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2023-12-28T14:04:59.636711Z",
     "start_time": "2023-12-28T14:04:59.569785Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[   0    1    2 ... 4619 4620 4621]\n"
     ]
    }
   ],
   "source": [
    "print(chipo.index.values)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 9. Which was the most-ordered item? "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2023-12-28T14:16:03.612031Z",
     "start_time": "2023-12-28T14:16:03.588136Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "              order_id  quantity  \\\n",
      "item_name                          \n",
      "Chicken Bowl    713926       761   \n",
      "\n",
      "                                             choice_description  item_price  \n",
      "item_name                                                                    \n",
      "Chicken Bowl  [Tomatillo-Red Chili Salsa (Hot), [Black Beans...     7342.73  \n"
     ]
    }
   ],
   "source": [
    "c = chipo.groupby('item_name')\n",
    "c = c.sum()\n",
    "c = c.sort_values(['quantity'], ascending=False)\n",
    "print(c.head(1))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 10. For the most-ordered item, how many items were ordered?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2023-12-28T14:04:59.638020Z",
     "start_time": "2023-12-28T14:04:59.582567Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": "726"
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "chipo['item_name'].value_counts().max()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 11. What was the most ordered item in the choice_description column?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2023-12-28T14:04:59.638292Z",
     "start_time": "2023-12-28T14:04:59.592264Z"
    }
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 12. How many items were orderd in total?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2023-12-28T14:04:59.638640Z",
     "start_time": "2023-12-28T14:04:59.595621Z"
    }
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 13. Turn the item price into a float"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Step 13.a. Check the item price type"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Step 13.b. Create a lambda function and change the type of item price"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {
    "collapsed": true,
    "ExecuteTime": {
     "end_time": "2023-12-28T14:04:59.638908Z",
     "start_time": "2023-12-28T14:04:59.603719Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "order_id                int64\n",
      "quantity                int64\n",
      "item_name              object\n",
      "choice_description     object\n",
      "item_price            float64\n",
      "dtype: object\n"
     ]
    }
   ],
   "source": [
    "#chipo['item_price'] = chipo['item_price'].str.replace(\"$\",\"\").astype(float)\n",
    "chipo['item_price'] = chipo['item_price'].apply(lambda x: float(x.replace('$','')))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Step 13.c. Check the item price type"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2023-12-28T14:05:25.399109Z",
     "start_time": "2023-12-28T14:05:25.390468Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "float64\n"
     ]
    }
   ],
   "source": [
    "print(chipo['item_price'].dtypes)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 14. How much was the revenue for the period in the dataset?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2023-12-28T14:04:59.640141Z",
     "start_time": "2023-12-28T14:04:59.611023Z"
    }
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 15. How many orders were made in the period?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2023-12-28T14:04:59.640541Z",
     "start_time": "2023-12-28T14:04:59.615394Z"
    }
   },
   "outputs": [],
   "source": [
    "chipo['order_id']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 16. What is the average revenue amount per order?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2023-12-28T14:04:59.640704Z",
     "start_time": "2023-12-28T14:04:59.621331Z"
    }
   },
   "outputs": [],
   "source": [
    "# Solution 1\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2023-12-28T14:04:59.653038Z",
     "start_time": "2023-12-28T14:04:59.625701Z"
    }
   },
   "outputs": [],
   "source": [
    "# Solution 2\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 17. How many different items are sold?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2023-12-28T14:04:59.769642Z",
     "start_time": "2023-12-28T14:04:59.632151Z"
    }
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "name": "python3",
   "language": "python",
   "display_name": "Python 3 (ipykernel)"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
